Method of compressing neural network model and electronic apparatus for performing the same

ABSTRACT

Disclosed is a method of compressing a neural network model that is performed by a computing device. The method includes receiving a trained model and compression method instructions for compressing the trained model, identifying a compressible block and a non-compressible block among a plurality of blocks included in the trained model based on the compression method instructions, transmitting a command to a user device that causes the user device to: display a structure of the trained model representing a connection relationship between the plurality of blocks on a first screen such that the compressible block and the non-compressible block are visually distinguished, and display, on a second screen, an input field operable to receive a parameter value entered by a user for compression of the compressible block, and compressing the trained model based on the parameter value entered by the user in the input field.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. application Ser. No. 18/163,527, filed on Feb. 2, 2023, which claims priority from Korean Patent Application No. 10-2022-0017230 filed in the Korean Intellectual Property Office on Feb. 10, 2022, Korean Patent Application No. 10-2022-0017231 filed in the Korean Intellectual Property Office on Feb. 10, 2022, Korean Patent Application No. 10-2022-0023385 filed in the Korean Intellectual Property Office on Feb. 23, 2022, Korean Patent Application No. 10-2022-0048201 filed in the Korean Intellectual Property Office on Apr. 19, 2022, Korean Patent Application No. 10-2022-0057599 filed in the Korean Intellectual Property Office on May 11, 2022, and Korean Patent Application No. 10-2022-0104355 filed in the Korean Intellectual Property Office on Aug. 19, 2022, the disclosures of which are incorporated herein by reference.

BACKGROUND

Field of the Invention

The present disclosure relates to a method of compressing a neural network model and an electronic apparatus for performing the same.

Discussion of Related Art

With the spread of artificial intelligence technology, a growing number of users need to run an artificial intelligence model on a target device. Although various artificial intelligence models are being released around the world, it is not easy for users to directly find an artificial intelligence model that has the performance they want. In addition, even if users find models with excellent performance, such as a state-of-the-art (SOTA) model, the models are not necessarily operable on a target device. For this reason, users have trouble checking whether the models can be run on the target device.

Accordingly, there is a need for a technology that allows users to conveniently acquire a neural network model optimized for a target device.

SUMMARY OF THE INVENTION

The present disclosure provides an electronic apparatus that provides a neural network model optimized for a target device.

The present disclosure also provides an electronic apparatus that provides a neural network model trained based on a data set input by a user.

The present disclosure also provides an electronic apparatus that provides a compressed neural network model trained based on a compression configuring value input by a user.

The present disclosure also provides an electronic apparatus that provides download data corresponding to a compressed neural network model.

Objects of the present disclosure are not limited to the above-mentioned objects. That is, other objects that are not described may be clearly understood by those skilled in the art to which the present disclosure pertains from the following description.

The present disclosure may provide a method of compressing a neural network model that is performed by a computing device, comprising: receiving, at a processor of the computing device, a trained model and compression method instructions for compressing the trained model; identifying, via the processor, a compressible block and a non-compressible block among a plurality of blocks included in the trained model based on the compression method instructions; transmitting, via a computer network, a command to a user device that causes the user device to: display a structure of the trained model representing a connection relationship between the plurality of blocks on a first screen such that the compressible block and the non-compressible block are visually distinguished, and display, on a second screen, an interactive input field operable to receive a parameter value entered by a user for compression of the compressible block; and compressing the trained model based on the parameter value entered by the user in the interactive input field.

When the compression method instructions configure a method of pruning, the identifying may comprise identifying among the plurality of blocks a non-compressible block which includes an activation function, a normalization function and an output channel that are directly connected to an arithmetic operator.

When the compression method instructions configure a method of filter decomposition, the identifying may comprise identifying among the plurality of blocks the compressible block which includes a convolutional layer.

The structure of the trained model may be represented by a connection between a plurality of user interface (UI) elements, each of the plurality of UI elements is associated with a respective one block of the plurality of blocks included in the trained model, each of the plurality of UI elements represents information on an associated block of the plurality of blocks, and the information on the associated block of the plurality of blocks may include identification information for the associated block and latency data corresponding to the associated block.

The method may further comprise receiving information, via a computer network, about a target device on which the trained model is to be executed; and receiving a plurality of latency data from the target device, wherein each latency data of the plurality of latency data may be associated with a respective one block of the plurality of blocks.

When a user selects a first UI element that corresponds to the compressible block and that is displayed on the first screen, the method may further comprise transmitting a command to the user device to activate the interactive input field corresponding to the compressible block displayed on the second screen.

When a user selects a second UI element that corresponds to the non-compressible block and that is displayed on the first screen, the method may further comprise transmitting a command to the user device to display detailed information about the non-compressible block on the first screen, wherein the detailed information on the non-compressible block may include at least one of a quantity of channels or a size of a kernel included in the non-compressible block.

The structure of the trained model may be a tree structure.

When a user selects a first UI element that corresponds to the compressible block and that is displayed on the first screen, the method may further comprise transmitting a command to the user device to display detailed information on the compressible block on the first screen, wherein the detailed information on the compressible block may include at least one of a quantity of channels or the size of a kernel included in the compressible block.

The first UI element may include a check box.

The present disclosure may provide an electronic apparatus for compressing a neural network model, comprising: a communication interface, configured to send and receive data via a data network, including at least one communication circuit; a non-transitory memory configured to store at least one operation instruction; and a processor, wherein execution of the at least one operation instruction causes the processor to: receive a trained model and compression method instructions for compressing the trained model; identify a compressible block and a non-compressible block among a plurality of blocks included in the trained model based on the compression method instructions; transmit a command to a user device via the communication interface that results in the user device: displaying a structure of the trained model representing a connection relationship between the plurality of blocks on a first screen such that the compressible block and the non-compressible block are visually distinguished, and displaying an interactive input field operable to receive a parameter value for compression of the compressible block on a second screen; and compress the trained model based on the parameter value entered by the user in the interactive input field.

When the compression method instructions correspond to a method of pruning, the processor may be further configured to: identify among the plurality of blocks a non-compressible block which includes an activation function, a normalization function, and an output channel that are directly connected to an arithmetic operator.

When the compression method instructions correspond to a method of filter decomposition, the processor may be further configured to: identify among the plurality of blocks the compressible block which includes a convolutional layer.

The structure of the trained model may be represented as connections between a plurality of user interface (UI) elements, wherein each of the plurality of UI elements: may be associated with a respective one block of the plurality of blocks included in the trained model, and may represent information regarding an associated block of the plurality of blocks, including identification information for the associated block, and latency data corresponding to the associated block.

The processor may be further configured to: receive information about a target device on which the trained model is to be executed; and receive a plurality of latency data from the target device, wherein each latency data of the plurality of latency data may correspond to a respective block of the plurality of blocks.

When a user selects a first UI element that corresponds to the compressible block and that is displayed on the first screen, the processor may be further configured to cause the communication interface to transmit a command to the user device to activate the interactive input field corresponding to the compressible block displayed on the second screen.

When a user selects a second UI element that corresponds to the non-compressible block and that is displayed on the first screen, the processor may be further configured to cause the communication interface to transmit a command to the user device to display detailed information about the non-compressible block on the first screen, and wherein the detailed information on the non-compressible block may include at least one of a quantity of channels or a size of a kernel included in the non-compressible block.

When a user selects a first UI element that corresponds to the compressible block and that is displayed on the first screen, the processor may be further configured to transmit a command to the user device to display detailed information on the compressible block on the first screen, and wherein the detailed information on the compressible block may include at least one of a quantity of channels or the size of a kernel included in the compressible block.

Technical solutions of the present disclosure are not limited to the abovementioned solutions, and solutions that are not mentioned will be clearly understood by those skilled in the art to which the present disclosure pertains from the present specification and the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

Aspects, features, and advantages of specific embodiments of the present disclosure will become more apparent from the following description with reference to the accompanying drawings:

FIG. 1 is a diagram showing an operation of an electronic apparatus in accordance with embodiments of the present disclosure;

FIG. 2 is a diagram showing a first compression mode in accordance with embodiments of the present disclosure;

FIG. 3 is a diagram showing a compression setting screen of a first compression mode in accordance with embodiments of the present disclosure;

FIG. 4 is a diagram showing a second compression mode in accordance with embodiments of the present disclosure;

FIG. 5 is a diagram showing a compression setting screen of a second compression mode in accordance with embodiments of the present disclosure;

FIG. 6 is a diagram showing a screen for setting a block compression configuring value in accordance with embodiments of the present disclosure;

FIG. 7 is a diagram showing a compression policy in accordance with embodiments of the present disclosure;

FIG. 8 is a flowchart showing a method of compressing a neural network model in accordance with embodiments of the present disclosure;

FIG. 9 is a diagram showing a screen for setting a block compression configuring value in accordance with embodiments of the present disclosure;

FIG. 10 is a diagram showing a screen for setting a block compression configuring value in accordance with embodiments of the present disclosure;

FIG. 11 is a diagram showing a screen for setting a block compression configuring value in accordance with embodiments of the present disclosure; and

FIG. 12 is a block diagram illustrating a configuration of the electronic apparatus according to the embodiment of the disclosure.

DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENTS

Terms used in the present specification will be briefly described, and then the present disclosure will be described in detail.

General terms that are currently widely used are selected as terms used in embodiments of the present disclosure in consideration of functions in the present disclosure, but may be changed depending on the intention of those skilled in the art or a judicial precedent, the emergence of a new technique, and the like. In addition, in a specific case, terms arbitrarily chosen by an applicant may be used. In this case, the meaning of such terms will be mentioned in detail in a corresponding description portion of the present disclosure. Therefore, the terms used in the present disclosure should be defined on the basis of the meaning of the terms and the contents throughout the present disclosure rather than simple names of the terms.

The present disclosure may be variously modified and have several embodiments, and therefore specific embodiments of the present disclosure will be illustrated in the accompanying drawings and given in detail in the detailed description. However, it is to be understood that the present disclosure is not limited to specific exemplary embodiments, but includes all modifications, equivalents, and substitutions without departing from the scope and spirit of the present disclosure. When it is determined that a detailed description of the known art related to the present disclosure may obscure the gist of the present disclosure, the detailed description will be omitted.

Terms “first,” “second,” and the like, may be used to describe various components, but the components are not to be construed as being limited by these terms. The terms are used only to distinguish one component from another component.

Singular forms are intended to include plural forms unless the context clearly indicates otherwise. More specifically, as used herein and in the appended claims, the singular forms “a,” “an,” “said,” and “the” include plural referents unless the context clearly dictates otherwise. It should be understood that terms “comprise” and “include” used in the present specification specify the presence of features, numerals, steps, operations, components, parts mentioned in the present specification, or combinations thereof, but do not preclude the presence or addition of one or more other features, numerals, steps, operations, components, parts, or combinations thereof.

Hereinafter, embodiments of the present disclosure will be described in detail with reference to the accompanying drawings so that those skilled in the art to which the present disclosure pertains may easily practice the present disclosure. However, the present disclosure may be modified in various different forms, and is not limited to the embodiments described herein. In addition, in the drawings, portions unrelated to the description will be omitted to clearly describe the disclosure, and similar reference numerals will be used to describe similar portions throughout the specification.

The details of embodiments set forth herein, both as to structure and operation, are provided in the accompanying figures, in which like reference numerals refer to like or corresponding elements among the various views. The elements in the figures are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the embodiments. Moreover, all illustrations are intended to convey concepts, where relative sizes, shapes and other detailed attributes may be illustrated schematically rather than literally or precisely.

The present disclosure may provide a method for providing a neural network model that is performed by a computing device, comprising: receiving, at a processor of the computing device, a trained model that has been trained based on a data set and a target device identified in a device farm using information about the target device that has been inputted by a user; compressing the trained model based on compression configuring information and latency information received from the device farm; and providing download data corresponding to the compressed trained model so that the compressed trained model is deployed on the target device.

The compression configuring information may include a first compression mode indicating that the trained model is compressed based on a model compression configuring value that is configured by the user. When the first compression mode is configured, the compressing the trained model may comprise: identifying a plurality of compressible target blocks among a plurality of blocks included in the trained model; deriving a first set of compression parameters, including block compression configuring values for block compression applied to a respective one of the plurality of target blocks, based on both the model compression configuring value and a predefined algorithm; and compressing the plurality of compressible target blocks based on the first set of compression parameters.

The compressing the trained model may further comprise providing the first set of compression parameters to the user. When the block compression configuring values are modified by the user, the compressing the trained model may further comprise compressing the plurality of target blocks based on a second set of compression parameters including the modified block compression configuring values.

The compression configuring information may include a second compression mode indicating that information on a block included in the trained model is provided and the trained model is compressed based on block compression configuring values configured by the user. When the second compression mode is configured, the compressing may comprise: identifying a plurality of compressible target blocks among a plurality of blocks included in the trained model; providing information on the plurality of target blocks to the user; receiving a set of third compression parameters including the block compression configuring values applied to a respective one of the plurality of target blocks, where the block compression configuring values have been configured by the user for the compression of the plurality of target blocks; and compressing the plurality of target blocks based on the set of third compression parameters.

The information on the block included in the trained model may include at least one of identification information of the block, a latency corresponding to the block, or a quantity of channels included in the block.

The compressing the trained model may further comprise: receiving a plurality of latency data from the target device, wherein each latency data of the plurality of latency data may be associated with a respective one block of the plurality of blocks, and wherein each latency data of the plurality of latency data may be acquired by executing an associated block of the plurality of blocks by the target device.

The compression configuring information may include at least one of compression methods, compression configuring values, or reference information for determining a compression target among a plurality of channels included in the trained model.

The method may further comprise: receiving, at the processor, a user command for retraining the compressed trained model; generating a retrained model based on the compressed trained model; and providing download data corresponding to the retrained model.

The method may further comprise: performing, at the processor, at least one quantization or calibration operation on the compressed trained model based on the information about the target device.

The present disclosure may provide an electronic apparatus for providing a neural network model, comprising: a communication interface, configured to send and receive data via a data network, including at least one communication circuit; a memory configured to store at least one operation instruction; and a processor, wherein execution of the at least one operation instruction causes the processor to: receive a trained model that has been trained based on a data set and a target device identified in a device farm using information about the target device that has been inputted by a user; compress the trained model based on compression configuring information and latency information received from the device farm; and provide download data corresponding to the compressed trained model so that the compressed trained model is deployed on the target device.

The compression configuring information may include a first compression mode indicating that the trained model is compressed based on a model compression configuring value that is configured by the user. When the first compression mode is configured, the processor may identify a plurality of compressible target blocks among a plurality of blocks included in the trained model, derive a first set of compression parameters, including block compression configuring values for block compression applied to a respective one of the plurality of target blocks, based on both the model compression configuring value and a predefined algorithm, and compress the plurality of compressible target blocks based on the first set of compression parameters.

The processor may provide the first set of compression parameters to the user. When at least one of the block compression configuring values is modified by the user, the processor may compress the plurality of target blocks based on a second set of compression parameters including the modified at least one of the block compression configuring values.

The compression configuring information may include a second compression mode indicating that information on a block included in the trained model is provided. The trained model may be compressed based on block compression configuring values configured by the user. When the second compression mode is configured, the processor may identify a plurality of compressible target blocks among a plurality of blocks included in the trained model, provide information on the plurality of target blocks to the user, receive a set of third compression parameters including the block compression configuring values applied to a respective one of the plurality of target blocks, where the block compression configuring values have been configured by the user for the compression of the plurality of target blocks, and compress the plurality of target blocks based on the set of third compression parameters.

The information on the block included in the trained model may include at least one of identification information of the block, a latency corresponding to the block, or a quantity of channels included in the block.

The processor may receive a plurality of latency data from the target device. Each latency data of the plurality of latency data may be associated with a respective one block of the plurality of blocks. Each latency data may be acquired by executing an associated block of the plurality of blocks by the target device.

The compression configuring information may include at least one of a compression method, a compression configuring value, or reference information for determining a compression target among a plurality of channels included in the trained model.

The processor may receive a user command for retraining the compressed trained model, generate a retrained model based on the compressed trained model, and provide download data corresponding to the retrained model.

The processor may quantize or calibrate the compressed trained model based on the information about the target device.

The processor may determine a compression configuring value of the trained model based on the latency information.

FIG. 1 is a diagram for describing an operation of an electronic apparatus according to an embodiment of the present disclosure.

Referring to FIG. 1, the electronic apparatus 1200 may include a model acquisition unit 110, a compression unit 120, and a launcher unit 130. The model acquisition unit 110, the compression unit 120, and the launcher unit 130 may be implemented as software modules. The processor 1230 may load instructions related to each unit into the memory 1220 and execute them.

The model acquisition unit 110 may acquire a trained model 115 based on a data set 101 and target device information 102 (or information on the target device). For example, the model acquisition unit 110 may perform a first project to acquire a first trained model. The model acquisition unit 110 may receive a compressed model 125 from the compression unit 120. The model acquisition unit 110 may acquire a retrained model by performing a third project configured based on the compressed model 125.

The model acquisition unit 110 may transmit the trained model 115 to the compression unit 120 or the launcher unit 130. For example, the model acquisition unit 110 may transmit the first trained model to the compression unit 120. The model acquisition unit 110 may transmit the retrained model to the launcher unit 130. Other operations (e.g., an operation of performing a project) of the electronic apparatus 1200 related to the model acquisition unit 110 have been described above, and detailed descriptions thereof will be omitted.

The compression unit 120 may output a lightweight model by performing compression on the input model. The compression unit 120 may compress the trained model 115 or a neural network model 135 to generate the compressed model 125. The neural network model 135 may be a predetermined model that has not been acquired by the model acquisition unit 110. The compression unit 120 may transmit the compressed model 125 to the launcher unit 130 or the model acquisition unit 110.

The compression unit 120 may compress the input model based on the compression configuring information configured by the user. The compression configuring information may include at least one of a compression mode, a compression method, a compression configuring value, or reference information for determining a compression target among a plurality of channels included in the input model. The compression mode may include a first compression mode for the compression of the input model based on a model compression configuring value configured by a user for the compression of the input model. The compression mode may include a second compression mode that provides information on a block included in the input model to a user and compresses the input model based on a block compression configuring value configured by the user for the block compression.
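As an illustration only, the compression configuring information described above could be held in a small record such as the following Python sketch. The class names, field names, and the `channel_criterion` values are hypothetical and are not taken from the disclosure; they merely mirror the four kinds of information listed (mode, method, configuring value, and reference information for choosing channels).

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class CompressionMode(Enum):
    MODEL_LEVEL = 1  # first compression mode: one value for the whole model
    BLOCK_LEVEL = 2  # second compression mode: the user sets a value per block


class CompressionMethod(Enum):
    PRUNING = "pruning"
    FILTER_DECOMPOSITION = "filter_decomposition"


@dataclass
class CompressionConfig:
    """Hypothetical container for the compression configuring information."""
    mode: CompressionMode
    method: CompressionMethod
    model_value: Optional[float] = None  # model compression configuring value
    block_values: Dict[str, float] = field(default_factory=dict)  # per-block values
    channel_criterion: str = "l2_norm"  # assumed name for the reference information

    def validate(self) -> None:
        # Each mode needs the value(s) that the mode is defined by.
        if self.mode is CompressionMode.MODEL_LEVEL and self.model_value is None:
            raise ValueError("first compression mode needs a model-level value")
        if self.mode is CompressionMode.BLOCK_LEVEL and not self.block_values:
            raise ValueError("second compression mode needs per-block values")
```

A record of this kind keeps the two modes explicit: in the first mode only `model_value` is supplied, and in the second mode the user supplies `block_values` directly.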

The launcher unit 130 may output download data 145 corresponding to the input model to be deployed on the target device. The model input to the launcher unit 130 may include the compressed model 125, the neural network model 135, and a retrained model.

The launcher unit 130 may perform quantization on the input model based on the target device information 102. The target device information 102 may include a data type (e.g., an 8-bit integer type) supported by the target device. The launcher unit 130 may convert the data type of the input model into a data type supported by the target device.

The launcher unit 130 may perform calibration on the input model. The launcher unit 130 may perform calibration based on a code input by a user or a pre-stored code. For example, the launcher unit 130 may adjust a quantization interval. The launcher unit 130 may perform quantization based on the adjusted quantization interval. Accordingly, parameter values (e.g., weight values) of the input model or the quantized model may be changed.
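The relationship between calibration and quantization can be sketched as follows. This is a minimal example, assuming affine 8-bit quantization and a simple calibration rule that discards a fraction of the extreme values before choosing the quantization interval; the disclosure does not specify either choice, and the function names and the `clip_fraction` parameter are hypothetical.

```python
from typing import List, Tuple


def calibrate_interval(values: List[float], clip_fraction: float = 0.0) -> Tuple[float, float]:
    """Choose the quantization interval [lo, hi], ignoring a fraction of extremes at each end."""
    if not values or not 0.0 <= clip_fraction < 0.5:
        raise ValueError("need at least one value and 0 <= clip_fraction < 0.5")
    ordered = sorted(values)
    skip = int(len(ordered) * clip_fraction)
    lo, hi = ordered[skip], ordered[-1 - skip]
    # Widen the interval if needed so that 0.0 stays inside it.
    return min(lo, 0.0), max(hi, 0.0)


def quantize_int8(values: List[float], lo: float, hi: float) -> Tuple[List[int], float, int]:
    """Map floats to signed 8-bit integers using the interval [lo, hi]."""
    scale = (hi - lo) / 255.0
    if scale == 0.0:
        scale = 1.0  # degenerate interval: every value maps to the zero point
    zero_point = round(-128 - lo / scale)
    quantized = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return quantized, scale, zero_point


def dequantize(quantized: List[int], scale: float, zero_point: int) -> List[float]:
    """Recover approximate float values from the 8-bit representation."""
    return [(q - zero_point) * scale for q in quantized]
```

Narrowing the interval reduces the step size (`scale`), so typical values are represented more finely while values outside the interval saturate at -128 or 127. This is one way in which adjusting the interval changes the parameter values of the quantized model.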

The launcher unit 130 may provide the download data 145 to a user. The download data 145 may mean a download file, a download package, or similar collection of data. When the user requests the download data 145, the launcher unit 130 may transmit the download data 145 to the user device. Accordingly, a neural network model optimized for the target device may be installed in the user device.

FIG. 2 is a diagram for describing a first compression mode according to an embodiment of the present disclosure. Each operation may be performed by the processor 1230.

Referring to FIG. 2, the electronic apparatus 1200 may receive a model compression configuring value configured by a user for compression of a base model (S210). For example, a model compression configuring value may include a value for determining a pruning ratio indicating a pruning degree and the number of ranks. The base model may include a trained model 115 and a neural network model 135 acquired by the model acquisition unit 110.

The electronic apparatus 1200 may identify a plurality of compressible target blocks among a plurality of blocks included in the base model (S220). A block may be a layer set including at least one layer. A block may contain various types of layers. For example, a block may include a convolution layer, an activation function, a regularization function, and an arithmetic operator (e.g., an addition operator or a multiplication operator).

The electronic apparatus 1200 may identify, as target blocks, the blocks other than those predefined as non-compressible. A block predefined as non-compressible may include a block including an activation function or a normalization function. In addition, a block predefined as non-compressible may include a block in which an output channel is directly connected to an arithmetic operator. Here, the fact that the output channel is directly connected to the arithmetic operator may mean that a block having a weight value does not exist between the output channel and the arithmetic operator. For example, a block immediately preceding the arithmetic operator may be a non-compressible block.
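The identification rule above can be sketched over a simple block graph. This is an illustrative Python sketch, not the disclosed implementation: the `Block` record, the `kind` labels, and the assumption that each block holds a single layer type are hypothetical. The sketch also assumes that operator nodes themselves carry no weights and are therefore never targets.

```python
from dataclasses import dataclass
from typing import Dict, List

ARITHMETIC_KINDS = {"add", "mul"}                      # arithmetic operators
NON_COMPRESSIBLE_KINDS = {"activation", "normalization"}
WEIGHTED_KINDS = {"conv", "linear"}                    # blocks that have weight values


@dataclass
class Block:
    name: str
    kind: str
    successors: List[str]  # names of the blocks fed by this block's output


def feeds_arithmetic_directly(block: Block, graph: Dict[str, Block]) -> bool:
    """True if an arithmetic operator is reachable without crossing a weighted block."""
    stack, seen = list(block.successors), set()
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        nxt = graph[name]
        if nxt.kind in ARITHMETIC_KINDS:
            return True
        if nxt.kind not in WEIGHTED_KINDS:
            stack.extend(nxt.successors)  # look through weightless blocks
    return False


def target_blocks(graph: Dict[str, Block]) -> List[str]:
    """Names of the blocks that are not predefined as non-compressible."""
    targets = []
    for block in graph.values():
        if block.kind in NON_COMPRESSIBLE_KINDS or block.kind in ARITHMETIC_KINDS:
            continue
        if feeds_arithmetic_directly(block, graph):
            continue
        targets.append(block.name)
    return targets


# A small residual-style graph: "stem" and "conv2" both feed the addition.
example = {b.name: b for b in [
    Block("stem", "conv", ["conv1", "add"]),
    Block("conv1", "conv", ["bn1"]),
    Block("bn1", "normalization", ["relu1"]),
    Block("relu1", "activation", ["conv2"]),
    Block("conv2", "conv", ["bn2"]),
    Block("bn2", "normalization", ["add"]),
    Block("add", "add", ["head"]),
    Block("head", "conv", []),
]}
```

In this example graph, `target_blocks(example)` keeps `conv1` and `head`. The blocks `stem` and `conv2` are excluded because no weighted block lies between their outputs and the addition operator.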

The electronic apparatus 1200 may derive a plurality of first block compression configuring values, each associated with a respective one of the plurality of target blocks, based on a model compression configuring value and a predefined algorithm (S230). The predefined algorithm may include so-called layer-adaptive sparsity for the magnitude-based pruning (LAMP) and variational Bayesian matrix factorization (VBMF). The block compression configuring value may include a pruning ratio indicating a pruning degree of an individual block and the number of ranks. In the present disclosure, the model compression configuring value may mean a value corresponding to the entire model, and the block compression configuring value may mean a value corresponding to an individual block included in the model.
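As a hedged illustration of how a single model-level pruning ratio can yield different per-block ratios, the sketch below follows the LAMP scoring rule as it is commonly described in the literature: within a layer, the score of a weight is its squared magnitude divided by the sum of squared magnitudes of all weights in that layer that are at least as large, and the weights with the lowest scores across the whole model are pruned first. The disclosure names LAMP but does not give this code; the function names are hypothetical, the scoring is per weight rather than per channel, and each layer is assumed to be a non-empty flat list of weights.

```python
from typing import Dict, List


def lamp_scores(weights: List[float]) -> List[float]:
    """LAMP score per weight: w^2 over the sum of squares of weights at least as large."""
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    scores = [0.0] * len(weights)
    running = 0.0  # sum of squares of the weights visited so far (largest first)
    for i in order:
        square = weights[i] ** 2
        running += square
        scores[i] = square / running if running > 0 else 0.0
    return scores


def layer_pruning_ratios(layers: Dict[str, List[float]], model_ratio: float) -> Dict[str, float]:
    """Turn one model-level ratio into a per-layer ratio by pruning the lowest scores globally."""
    scored = [(score, name) for name, w in layers.items() for score in lamp_scores(w)]
    scored.sort(key=lambda item: item[0])
    n_prune = int(len(scored) * model_ratio)
    pruned = {name: 0 for name in layers}
    for _, name in scored[:n_prune]:
        pruned[name] += 1
    return {name: pruned[name] / len(layers[name]) for name in layers}
```

For instance, with layers `{"a": [1.0, 1.0, 1.0, 1.0], "b": [3.0, 0.1, 0.1, 0.1]}` and a model-level ratio of 0.5, layer `b` receives a higher ratio than layer `a`, because its three small weights contribute little relative to its one large weight.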

The electronic apparatus 1200 may derive the block compression configuring value based on the latency acquired from the device farm. For example, the electronic apparatus 1200 may determine a greater compression ratio for a block as the latency corresponding to that block increases. Also, the electronic apparatus 1200 may adjust a block compression configuring value acquired based on the predefined algorithm using the latency acquired from the device farm.
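One possible way to adjust algorithm-derived ratios with measured latency is sketched below. The scaling rule, the `strength` and `max_ratio` parameters, and the function name are assumptions made for illustration; the disclosure only states that a block with a larger latency may receive a greater compression ratio.

```python
from typing import Dict


def adjust_by_latency(
    ratios: Dict[str, float],
    latencies_ms: Dict[str, float],
    strength: float = 0.5,
    max_ratio: float = 0.9,
) -> Dict[str, float]:
    """Raise the ratio of slower-than-average blocks and lower it for faster ones."""
    mean = sum(latencies_ms[name] for name in ratios) / len(ratios)
    adjusted = {}
    for name, ratio in ratios.items():
        relative = latencies_ms[name] / mean if mean > 0 else 1.0
        scaled = ratio * (1.0 + strength * (relative - 1.0))
        adjusted[name] = min(max_ratio, max(0.0, scaled))  # keep the ratio in a safe range
    return adjusted
```

With equal starting ratios of 0.4 and latencies of 30 ms and 10 ms, the slower block ends at about 0.5 and the faster block at about 0.3, so the adjustment is monotonic in latency as the text describes.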

The electronic apparatus 1200 may compress a plurality of target blocks based on a configuring value for compression of a plurality of first blocks (S240). Accordingly, the electronic apparatus 1200 may acquire a compressed model. For example, the electronic apparatus 1200 may perform pruning on a plurality of target blocks. Alternatively, the electronic apparatus 1200 may perform filter decomposition (or tensor decomposition) on a plurality of target blocks.

The electronic apparatus 1200 may provide a configuring value for compression of the plurality of first blocks to a user. For example, the electronic apparatus 1200 may transmit, to a user device, the configuring value for compression of the plurality of first blocks together with a command related to its display, so that the configuring value for compression of the plurality of first blocks may be displayed on the user device. Accordingly, the user device may display the configuring value for compression of the plurality of first blocks.

The user may modify at least one block compression configuring value among the configuring values for the compression of the plurality of first blocks. The electronic apparatus 1200 may receive a user command for modifying a configuring value for compression of at least one first block from the user device. The electronic apparatus 1200 may compress a plurality of target blocks based on the user command.

In a first compression mode, a user may obtain a lightweight model by inputting only a single model compression configuring value. Accordingly, user convenience may be improved. In another embodiment, a user may input a plurality of model compression configuring values each being associated with a respective one of a plurality of compression methods. For example, a user may input a first model compression configuring value corresponding to pruning and a second model compression configuring value corresponding to filter decomposition.

FIG. 3 is a compression setting screen of a first compression mode according to an embodiment of the present disclosure.

Referring to FIG. 3, a compression setting screen 300 may include a first region 310 for receiving a name of a compressed model, a second region 320 for receiving a user memo for compression, a third region 330 for receiving a base model to be compressed, and a fourth region 340 for receiving a model compression configuring value. The compression setting screen 300 may be displayed on a user device.

The user device may transmit information input to the compression setting screen 300 to the electronic apparatus 1200. The electronic apparatus 1200 may acquire a configuring value for compression of a plurality of blocks corresponding to a plurality of target blocks included in the base model based on the information input to the compression setting screen 300. The electronic apparatus 1200 may identify a model selected in the third region 330 as a base model. The third region 330 may be provided with a model list including the trained model 115 and the neural network model 135 acquired by the model acquisition unit 110. The electronic apparatus 1200 may acquire a plurality of compression ratios corresponding to a plurality of target blocks based on a compression ratio configured by a user in the fourth region 340.

The electronic apparatus 1200 may acquire a block compression configuring value corresponding to a predetermined compression method based on a model compression configuring value configured by a user. The predetermined compression method may include pruning and/or filter decomposition. For example, the electronic apparatus 1200 may acquire a pruning ratio corresponding to a target block based on a compression ratio configured by a user. Alternatively, the electronic apparatus 1200 may acquire the number of ranks corresponding to a target block based on a compression ratio configured by a user. The predetermined compression method may be configured by a user.

A plurality of predetermined compression methods may be used together. For example, the electronic apparatus 1200 may acquire a pruning ratio and the number of ranks corresponding to a target block based on a compression ratio configured by a user. The electronic apparatus 1200 may perform both the pruning and the filter decomposition on the base model.

Meanwhile, a user may set the compression method and the model compression configuring value together. For example, the user may select the pruning as the compression method and input a pruning ratio corresponding to the base model. In this case, the electronic apparatus 1200 may acquire a pruning ratio corresponding to a target block included in the base model based on the pruning ratio corresponding to the base model.

Although not illustrated in FIG. 3, the compression setting screen 300 may include a compression method selection region for acquiring a user command for selecting a compression method. Alternatively, the compression method selection region may be provided on a separate screen.

FIG. 4 is a diagram for describing a second compression mode according to an embodiment of the present disclosure. Each operation may be performed by the processor 1230.

Referring to FIG. 4, the electronic apparatus 1200 may derive profile information of a base model by analyzing the base model (S410). The profile information of the base model may include information on each block included in the base model. Information on each block may include identification information of a block, a latency corresponding to the block, the quantity of channels included in the block, and a size of a kernel included in the block.

The electronic apparatus 1200 may provide profile information of a base model to a user (S420). The electronic apparatus 1200 may transmit the profile information of the base model to a user device. The user device may display the profile information of the base model.

The electronic apparatus 1200 may receive a configuring value for compression of a plurality of second blocks configured by a user for compression of a plurality of target blocks included in the base model (S430). A configuring value for compression of a plurality of second blocks may correspond to a plurality of target blocks, respectively.

The electronic apparatus 1200 may compress a plurality of target blocks based on a configuring value for compression of a plurality of second blocks (S440). For example, the electronic apparatus 1200 may perform pruning or filter decomposition on a plurality of target blocks. Accordingly, the electronic apparatus 1200 may acquire a lightweight model.

FIG. 5 is a compression setting screen of a second compression mode according to an embodiment of the present disclosure.

Referring to FIG. 5, a compression setting screen 500 may include a first region 510 for receiving a name and a memo of a compressed model, a second region 520 for receiving a base model to be compressed, and a third region 530 for receiving a compression method. A description 531 of the selected compression method may be displayed in the third region 530.

The compression method may include pruning and filter decomposition. The pruning may include a first type of pruning based on a criterion and a second type of pruning based on an index configured by a user. The filter decomposition may include Tucker decomposition and canonical polyadic (CP) decomposition. The compression setting screen 500 may be displayed on a user device. The user device may transmit user input-related information input to the compression setting screen 500 to the electronic apparatus 1200. The electronic apparatus 1200 may perform compression on a base model based on the base model and compression method selected by the user.

FIG. 6 is a screen for setting a block compression configuring value according to an embodiment of the present disclosure.

Referring to FIG. 6, a screen 600 for setting a block compression configuring value may include a first screen 610 on which information on a base model is displayed and a second screen 620 for receiving a block compression configuring value. The architecture of the base model may be displayed on the first screen 610. Also, the latency corresponding to each block and the quantity of channels included in the model may be displayed on the first screen 610.

The user device may acquire a user input for setting a block compression configuring value on the second screen 620. For example, the user device may acquire a configuring value (e.g., 0.5) for first block compression corresponding to the first block (block 1). The user device may transmit a configuring value for first block compression to the electronic apparatus 1200. The electronic apparatus 1200 may compress the first block based on the configuring value for first block compression.

As such, in the second compression mode, the user may set a block compression configuring value desired for each block, and acquire a compressed model in which each block is compressed as much as desired. Accordingly, the user satisfaction may be improved.

Although not illustrated, a UI element for selecting a compression policy may be displayed on the compression setting screen 500 or the screen 600. The compression policy may mean a rule on how to perform compression. For example, when the compression method is pruning, the channel to be pruned may vary according to the compression policy even if the configuring value for compression is the same.

FIG. 7 is a diagram for describing a compression policy according to an embodiment of the present disclosure. Specifically, FIG. 7 illustrates nodes that are pruned for three compression policies.

Referring to FIG. 7, a block may include a first layer 710 and a second layer 720. The first layer 710 may include a plurality of nodes N11, N12, N13, N14, and N15. The second layer 720 may include a plurality of nodes N21, N22, N23, N24, and N25. The node N11 and the node N21 have the same index. The node N12 and the node N22 have the same index. The node N13 and the node N23 have the same index. The node N14 and the node N24 have the same index. The node N15 and the node N25 have the same index.

The number indicated on each node (or neuron) indicates the importance of each node. For example, the importance of the node N11 is 0.08, and the importance of the node N12 is 0.14. The indicated importance may be a normalized value. The electronic apparatus 1200 may calculate the importance of each node based on the compression method selected by the user. For example, when “L2 norm pruning” is selected in the third region 530, the electronic apparatus 1200 may calculate the importance of each node based on the L2 norm.
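
An L2-norm-based importance calculation may be sketched as follows. The disclosure does not state how the importance is normalized; dividing each norm by the sum of all norms in the layer is an assumption made here, and the function name and weight values are hypothetical.

```python
import math


def l2_importance(node_weights):
    """Return one normalized importance value per node.

    `node_weights` holds one list of weights per node; the importance of
    a node is the L2 norm of its weights divided by the sum of the norms
    of all nodes in the layer (an assumed normalization).
    """
    norms = [math.sqrt(sum(w * w for w in weights)) for weights in node_weights]
    total = sum(norms)
    if total == 0:
        return [0.0 for _ in norms]
    return [norm / total for norm in norms]


# Hypothetical weights for three nodes of one layer.
importance = l2_importance([[3.0, 4.0], [0.0, 0.0], [6.0, 8.0]])
```

Here the norms are 5, 0, and 10, so the third node is rated twice as important as the first and the all-zero node receives an importance of zero.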

The electronic apparatus 1200 may determine a node to be pruned based on the compression policy and the importance of each node. Hereinafter, a pruning method according to various compression policies will be described.

When the compression policy is a first policy (average), the electronic apparatus 1200 may identify two nodes in order of low importance in each channel. For example, the electronic apparatus 1200 may identify the node N11 and the node N12 in the first layer 710. The electronic apparatus 1200 may identify the node N22 and the node N24 in the second layer 720. The electronic apparatus 1200 may calculate an average value of the importance of each identified node and the importance of a node having the same index as the identified node. For example, the electronic apparatus 1200 may calculate an average value of the importance of the node N11 and the importance of the node N21. In addition, the electronic apparatus 1200 may calculate an average value of the importance of the node N12 and the importance of the node N22. The electronic apparatus 1200 may prune nodes included in a node set having the smallest average value. For example, the electronic apparatus 1200 may prune the node N11 and the node N21. Also, the electronic apparatus 1200 may prune the node N12 and the node N22.

When the compression policy is a second policy (intersection), the electronic apparatus 1200 may identify two nodes in order of low importance in each channel. For example, the electronic apparatus 1200 may identify the node N11 and the node N12 in the first layer 710. The electronic apparatus 1200 may identify the node N22 and the node N24 in the second layer 720. The electronic apparatus 1200 may prune nodes having the same index among the identified nodes. For example, the electronic apparatus 1200 may prune the node N12 and the node N22.

When the compression policy is a third policy (union), the electronic apparatus 1200 may identify two nodes in order of low importance in each channel. For example, the electronic apparatus 1200 may identify the node N11 and the node N12 in the first layer 710. The electronic apparatus 1200 may identify the node N22 and the node N24 in the second layer 720. The electronic apparatus 1200 may prune nodes having the same index as each of the identified nodes. For example, the electronic apparatus 1200 may prune the node N11 and the node N21 having the same index as the node N11. The electronic apparatus 1200 may prune the node N12 and the node N22. The electronic apparatus 1200 may prune the node N24 and the node N14 having the same index as the node N24.

Meanwhile, in FIG. 7, the number of nodes identified in each channel is two as an example, but the present disclosure is not limited thereto. For example, the electronic apparatus 1200 may identify three or more nodes in the order of low importance in each channel.
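
The three compression policies described above may be sketched as follows. Only the importance values 0.08 (N11) and 0.14 (N12) come from the description; every other value, the function names, and the reading of the average policy as "keep the k index pairs with the smallest average" are assumptions made for illustration.

```python
def lowest_k(importance, k):
    """Indices of the k least important nodes in one layer."""
    ranked = sorted(range(len(importance)), key=lambda i: importance[i])
    return set(ranked[:k])


def indices_to_prune(layer1, layer2, k, policy):
    """Return the node indices pruned in both layers under a policy."""
    low1, low2 = lowest_k(layer1, k), lowest_k(layer2, k)
    if policy == "intersection":
        return low1 & low2
    if policy == "union":
        return low1 | low2
    if policy == "average":
        # Average each candidate with its same-index counterpart and keep
        # the k index pairs with the smallest average (assumed reading).
        candidates = low1 | low2
        ranked = sorted(candidates, key=lambda i: (layer1[i] + layer2[i]) / 2)
        return set(ranked[:k])
    raise ValueError(f"unknown policy: {policy}")


# Index 0 corresponds to N11/N21, index 1 to N12/N22, and so on.
first_layer = [0.08, 0.14, 0.30, 0.28, 0.20]   # 0.08 and 0.14 from FIG. 7
second_layer = [0.22, 0.10, 0.30, 0.15, 0.23]  # hypothetical values

pruned_average = indices_to_prune(first_layer, second_layer, 2, "average")
pruned_intersection = indices_to_prune(first_layer, second_layer, 2, "intersection")
pruned_union = indices_to_prune(first_layer, second_layer, 2, "union")
```

With these values the average policy selects indices 0 and 1 (N11/N21 and N12/N22), the intersection policy selects index 1 only (N12/N22), and the union policy selects indices 0, 1, and 3, consistent with the examples given for FIG. 7. The parameter `k` may be raised to identify three or more nodes per layer.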

FIG. 8 is a flowchart illustrating a method of compressing a neural network model according to an embodiment of the present disclosure.

Referring to FIG. 8, the electronic apparatus 1200 may receive a trained model and a compression method for compressing the trained model (S810). For example, the electronic apparatus 1200 may acquire the trained model 115 based on the model acquisition unit 110. Alternatively, the electronic apparatus 1200 may acquire the neural network model 135.

The electronic apparatus 1200 may identify a compressible block and a non-compressible block among a plurality of blocks included in the trained model based on the compression method (S820). In the present disclosure, the non-compressible block may include not only a block for which compression may not be performed, but also a block for which compression may be performed but for which performance of the compressed model falls below a threshold value when the compression is performed.

Depending on the compression method, the criteria for determining whether the trained model can be compressed may be different. The compression method may include pruning and filter decomposition.

When the compression method is pruning, the electronic apparatus 1200 may identify, as a non-compressible block, a block including an activation function, a block including a normalization function, and a block in which an output channel is directly connected to an arithmetic operator. Here, the fact that the output channel is directly connected to the arithmetic operator may mean that other blocks having a weight value do not exist between the corresponding block and the arithmetic operator. For example, a third block, a fourth block, and a fifth block may be sequentially connected in series. The fourth block may be an activation function or a normalization function, and the fifth block may be an arithmetic operator. In this case, the third block may be a “block of which the output channel is directly connected to the arithmetic operator.” Accordingly, the electronic apparatus 1200 may determine that the third block is the non-compressible block.
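
The pruning rule above may be sketched as a traversal over a block graph. The dictionary layout (`kind`, `next`), the set of block kinds treated as having a weight value, and the sixth block appended to the example are all assumptions introduced for illustration.

```python
WEIGHTED_KINDS = {"conv", "linear"}              # assumed kinds having weights
FUNCTION_KINDS = {"activation", "normalization"}


def feeds_arithmetic(name, blocks):
    """True if the output of `name` reaches an arithmetic operator without
    passing through any block that has a weight value."""
    stack = list(blocks[name]["next"])
    seen = set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        kind = blocks[current]["kind"]
        if kind == "arithmetic":
            return True
        if kind in WEIGHTED_KINDS:
            continue  # a weighted block breaks the direct connection
        stack.extend(blocks[current]["next"])
    return False


def is_non_compressible(name, blocks):
    kind = blocks[name]["kind"]
    return kind in FUNCTION_KINDS or feeds_arithmetic(name, blocks)


# Third, fourth, and fifth blocks connected in series, as in the example;
# the sixth block is hypothetical.
graph = {
    "block3": {"kind": "conv", "next": ["block4"]},
    "block4": {"kind": "activation", "next": ["block5"]},
    "block5": {"kind": "arithmetic", "next": ["block6"]},
    "block6": {"kind": "conv", "next": []},
}
non_compressible = {name for name in graph if is_non_compressible(name, graph)}
```

In this sketch the third block is classified as non-compressible because only an activation function lies between it and the arithmetic operator, while the sixth block remains compressible.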

When the compression method is filter decomposition, the electronic apparatus 1200 may identify a block including a convolutional layer as the compressible block.

The electronic apparatus 1200 may transmit a command to a user device to display a structure of the trained model representing a connection relationship between the plurality of blocks on a first screen such that the compressible block and the non-compressible block are visually distinguished, and to display an input field for receiving a configuring value for compression of the compressible block on a second screen (S830). The user device may display the structure of the trained model on the first screen based on the command received from the electronic apparatus 1200. Also, the user device may display the input field for receiving the configuring value for compression of the compressible block on the second screen. The user device may simultaneously output the first screen and the second screen.

The structure of the trained model may represent a connection relationship between a plurality of UI elements each being associated with a respective one of the plurality of blocks included in the trained model. The plurality of UI elements may each represent information on one of the plurality of blocks. The information on each of the plurality of blocks may include identification information of each of the plurality of blocks and a plurality of latencies each being associated with a respective one of the plurality of blocks. For example, the structure of the trained model may be expressed in a graph form in which a plurality of UI elements are expressed as nodes.

Meanwhile, the electronic apparatus 1200 may acquire a plurality of latencies each being associated with a respective one of the plurality of blocks using a device farm including a target device on which the trained model is to be executed. For example, when the target device is selected as the first device, the user device may transmit the information on the first device to the electronic apparatus 1200. The electronic apparatus 1200 may identify the first device in the device farm based on the information on the first device. The electronic apparatus 1200 may calculate the plurality of latencies each being associated with a respective one of the plurality of blocks by executing the trained model in the first device.

The electronic apparatus 1200 may compress the trained model based on the block compression configuring value entered by the user in the input field (S840). For example, the electronic apparatus 1200 may perform the pruning on the trained model based on the pruning ratio input by the user.

Meanwhile, when the first UI element corresponding to the compressible block displayed on the first screen is selected, the electronic apparatus 1200 may transmit a command to the user device to activate the input field corresponding to the compressible block displayed on the second screen. Accordingly, the user may input a configuring value for compression into the activated input field. Also, when the first UI element is selected, the electronic apparatus 1200 may transmit a command to the user device to display detailed information on the compressible block corresponding to the selected first UI element on the first screen.

When the second UI element corresponding to the non-compressible block displayed on the first screen is selected, the electronic apparatus 1200 may transmit a command to the user device to display detailed information on the non-compressible block on the first screen. The detailed information on the non-compressible block may include at least one of the quantity of channels or a size of a kernel included in the non-compressible block.

In FIG. 8, it has been described that the user device displays the first screen and the second screen based on the command received from the electronic apparatus 1200. In another embodiment, the user device may display the first screen and the second screen based on a user input without the command received from the electronic apparatus 1200. For example, when a user input for selecting a UI element corresponding to the first compressible block displayed on the first screen is acquired, the user device may activate the first input field corresponding to the first block displayed on the second screen.

Hereinafter, the first screen and the second screen will be described in detail.

FIG. 9 is a screen for setting a block compression configuring value according to an embodiment of the present disclosure. A screen 900 may be displayed on the user device when the compression mode is a second compression mode. A user may input a block compression configuring value corresponding to a block included in a trained model to be compressed based on the screen 900.

Referring to FIG. 9, the screen 900 may include a first screen 910 and a second screen 920. The user device may display the structure of the trained model on the first screen 910. For example, the structure of the trained model may be a hierarchical structure in which a plurality of UI elements 911, 912, 913, 914, 915, 916, and 917 each being associated with a respective one of the plurality of blocks (add, conv1, conv2, relu, hardsigmoid, mul, and conv3) included in the trained model are represented by nodes. The structure of the trained model may represent a connection relationship between a plurality of UI elements 911, 912, 913, 914, 915, 916, and 917.

The user device may display the plurality of UI elements 911, 912, 913, 914, 915, 916, and 917 on the first screen 910. Each of the plurality of UI elements 911, 912, 913, 914, 915, 916, and 917 may indicate information on a corresponding block. For example, the first UI element 911 corresponding to the first block (add) may include an indicator LI1 indicating a latency corresponding to the first block (add). As such, when the latency corresponding to the block is displayed on the screen 900, the user may refer to the displayed latency when determining a block compression configuring value. That is, the configuring value for compression of each block may be determined based on the latency corresponding to each block. In addition, the user convenience may be improved.

The user device may distinguish and display the compressible block and the non-compressible block. In FIG. 9, the electronic apparatus 1200 may determine a first block (add), a sixth block (mul), and a seventh block (conv3) as compressible blocks. The electronic apparatus 1200 may determine a second block (conv1), a third block (conv2), a fourth block (relu), and a fifth block (hardsigmoid) as non-compressible blocks. Specifically, since output channels of the second block (conv1) and the third block (conv2) are directly connected to a sixth block (mul), which is a multiplication operator, the second block (conv1) and the third block (conv2) may be determined as the non-compressible blocks. Since the fourth block (relu) and the fifth block (hardsigmoid) are activation functions, it may be determined that the fourth block (relu) and the fifth block (hardsigmoid) are the non-compressible blocks.

For example, the UI elements 911, 916, and 917 corresponding to the compressible blocks (add, mul, and conv3) may include check boxes CB1, CB6, and CB7. The UI elements 912, 913, 914, and 915 corresponding to the non-compressible blocks (conv1, conv2, relu, and hardsigmoid) may not include a check box. The UI elements 911, 916, and 917 may be displayed with better visibility than the UI elements 912, 913, 914, and 915. For example, the UI elements 911, 916, and 917 may be displayed brighter than the UI elements 912, 913, 914, and 915. Alternatively, the UI elements 911, 916, and 917 may be displayed with a solid line and the UI elements 912, 913, 914, and 915 may be displayed with a dotted line.

The user device may display the information on the compressible block on the second screen 920. For example, the user device may indicate the quantity of output channels and names of each of the compressible blocks (add, mul, and conv3). The user device may display an input field for receiving a configuring value for compression of the compressible block. Here, the configuring value for compression means the block compression configuring value described above. For example, the user device may display input fields IF1, IF2, and IF3 each being associated with a respective one of the compressible blocks (add, mul, and conv3). The input fields IF1, IF2, and IF3 may receive a pruning ratio. Also, the user device may display check boxes CB11, CB12, and CB13 for selecting each of the compressible blocks (add, mul, and conv3).

FIG. 10 is a screen for setting a block compression configuring value according to an embodiment of the present disclosure.

Referring to FIG. 10, the user device may display the second screen 920 based on a user input acquired through the first screen 910. For example, the first UI element 911 or the first block (add) may be selected by the user. For example, a user may click the check box CB1. The user device may display a check mark in the check box CB11 corresponding to the selected first block (add), and activate the first input field IF1. The selection of the first UI element 911 may be released. In this case, the user device may deactivate the first input field IF1.

The user device may display the first screen 910 based on the user input acquired through the second screen 920. For example, when the check box CB11 corresponding to the first block (add) is selected, the user device may display a check mark in the check box CB1 corresponding to the first block (add). When the selection of the check box CB11 is released, the user device may remove the check mark displayed in the check box CB1.

FIG. 11 is a screen for setting a block compression configuring value according to an embodiment of the present disclosure.

Referring to FIG. 11, the user device may provide detailed information related to a block selected by a user. The detailed information related to the block may include at least one of the quantity of channels, a size of a kernel, a stride, or a latency of the block. For example, the seventh block (conv3) may be selected. In this case, the user device may display detailed information 930 related to the seventh block (conv3) on the first screen 910. Meanwhile, the user may select a non-compressible block. For example, the user may select the second block (conv1). In this case, the user device may display detailed information related to the second block (conv1) on the first screen 910.

Meanwhile, FIGS. 9 to 11 illustrate that the input field receives a ratio value greater than 0 and less than or equal to 1 as a block compression configuring value. However, the present disclosure is not limited thereto, and the range of the block compression configuring value may be variously changed according to the compression method. For example, when the compression method is a second type of pruning based on an index, the input field may receive an index of a channel to be pruned. As another example, when the compression method is Tucker decomposition, the input field may receive the quantity of input channels of the core tensor and the quantity of output channels of the core tensor.
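
How the accepted input might depend on the compression method may be sketched as a simple validation routine. The method identifiers (`ratio_pruning`, `index_pruning`, `tucker`), the function name, and the exact checks are assumptions for illustration; the disclosure states only the kind of value each input field may receive.

```python
def is_valid_block_value(method, value, num_channels):
    """Check a block compression configuring value against its method."""
    if method == "ratio_pruning":
        # A ratio greater than 0 and less than or equal to 1.
        return isinstance(value, (int, float)) and 0 < value <= 1
    if method == "index_pruning":
        # One or more indices of channels to be pruned.
        return len(value) > 0 and all(
            isinstance(i, int) and 0 <= i < num_channels for i in value
        )
    if method == "tucker":
        # Quantities of input and output channels of the core tensor.
        in_channels, out_channels = value
        return all(
            isinstance(c, int) and c >= 1 for c in (in_channels, out_channels)
        )
    raise ValueError(f"unknown compression method: {method}")


# Hypothetical inputs for a block having 16 channels.
ratio_ok = is_valid_block_value("ratio_pruning", 0.5, 16)
index_ok = is_valid_block_value("index_pruning", [0, 3], 16)
tucker_ok = is_valid_block_value("tucker", (8, 4), 16)
```

A user device could apply a check of this kind before transmitting the entered value to the electronic apparatus 1200, although the disclosure does not require where such validation takes place.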

Meanwhile, a block compression configuring value may be input by a user or may be determined by the electronic apparatus 1200. For example, the electronic apparatus 1200 may configure the compression ratio of each block based on the latency corresponding to each block. The electronic apparatus 1200 may configure the compression ratio of the block to be higher as the latency corresponding to the block increases. Referring to FIG. 9, the compression ratio corresponding to the first block (add) may be smaller than the compression ratio corresponding to the sixth block (mul).

FIG. 12 is a block diagram illustrating a configuration of the electronic apparatus according to the embodiment of the disclosure.

Referring to FIG. 12, the electronic apparatus 1200 may include a communication interface 1210, a memory 1220, and a processor 1230. For example, the electronic apparatus 1200 may be implemented as a physical server or a cloud server.

The communication interface 1210 includes at least one communication circuit and may communicate with various types of external devices. For example, the communication interface 1210 may receive information on a data set and a target device from an external device. The external device may be a user device. The user device may include personal computers and mobile devices. The communication interface 1210 may transmit information on a plurality of base models retrieved based on the information on the target device to the external device. Accordingly, the external device may output the information on the plurality of base models. The communication interface 1210 may receive a user command for selecting at least one of the plurality of base models from the external device.

The communication interface 1210 may transmit at least one selected base model and data set to an external server. The external server may acquire a trained neural network model (or trained model) after training at least one base model selected using the data set. The communication interface 1210 may receive a trained model from the external server.

The communication interface 1210 may transmit the trained model to the external device. The communication interface 1210 may transmit information on the trained model to the external device. The information on the trained model may include a name of the trained model, a task performed by the trained model, information on a target device corresponding to the trained model, and performance (e.g., accuracy and latency) of the trained model. Meanwhile, in the present disclosure, acquiring/storing/transmitting/receiving a neural network model means acquiring/storing/transmitting/receiving data (e.g., architecture, weight) related to a model.

The communication interface 1210 may include at least one of a Wi-Fi communication module, a cellular communication module, a 3rd generation (3G) mobile communication module, a 4th generation (4G) mobile communication module, a 4th generation long term evolution (LTE) communication module, a 5th generation (5G) mobile communication module, or wired Ethernet.

The memory 1220 may store an operating system (OS) for controlling an overall operation of the components of the electronic apparatus 1200 and commands or data related to the components of the electronic apparatus 1200. The memory 1220 may be implemented as a non-volatile memory (e.g., a hard disk, a solid state drive (SSD), and a flash memory), a volatile memory, or the like.

The memory 1220 may include a database (DB). For example, the memory 1220 may include a data set DB for storing a data set. The memory 1220 may include a project DB for storing a project. The memory 1220 may include a model DB for storing the trained model. The information stored in the DB may be provided to a user. For example, a data set list, a project list, and/or a model list may be displayed on an external device.

The memory 1220 may store information on a plurality of neural network models. For example, the memory 1220 may store a look-up table in which identification information of a plurality of neural network models, information on a target device, and performance information of a plurality of neural network models are matched. The performance information of the plurality of neural network models may reflect performance (e.g., latency) of each of the plurality of neural network models when the neural network models are executed in the target device. The performance of the neural network model for the target device may be the performance of the neural network model when the neural network model is executed in the target device. The latency of the neural network model may be acquired from a device farm. The accuracy of the neural network model may be acquired using test data.
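
One possible in-memory form of such a look-up table is sketched below. The key layout, field names, model and device identifiers, and performance figures are all hypothetical; the disclosure specifies only that model identification information, target device information, and performance information are matched.

```python
# Keys pair a model identifier with a target device identifier; values
# hold the performance measured for that pair (all entries hypothetical).
LOOKUP_TABLE = {
    ("model_a", "device_x"): {"latency_ms": 12.0, "accuracy": 0.91},
    ("model_b", "device_x"): {"latency_ms": 7.5, "accuracy": 0.88},
    ("model_a", "device_y"): {"latency_ms": 30.0, "accuracy": 0.91},
}


def models_for_device(table, device):
    """Return (model, performance) pairs for one device, fastest first."""
    rows = [(model, perf) for (model, dev), perf in table.items() if dev == device]
    return sorted(rows, key=lambda row: row[1]["latency_ms"])


candidates = models_for_device(LOOKUP_TABLE, "device_x")
```

A query of this kind could support retrieval of base models based on the information on the target device, with latency values populated from a device farm and accuracy values from test data.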

The memory 1220 may store a predefined algorithm for searching for the base model. The predefined algorithm may include at least one of a hyper-parameter optimization (HPO) algorithm or a neural architecture search (NAS) algorithm. The hyper-parameter optimization algorithm may include a tree-structured Parzen estimator (TPE) algorithm. The TPE algorithm may be based on Bayesian optimization. The neural network architecture search algorithm may be based on an evolutionary algorithm.

The processor 1230 may be electrically connected to the memory 1220 to control overall operations and functions of the electronic apparatus 1200. The processor 1230 may control the electronic apparatus 1200 by executing instructions stored in the memory 1220.

The processor 1230 may acquire a trained model and a compression method for compressing the trained model. For example, the processor 1230 may acquire the trained model 115 based on the model acquisition unit 110. Alternatively, the processor 1230 may acquire the neural network model 135.

The processor 1230 may identify a compressible block and a non-compressible block among a plurality of blocks included in the trained model based on the compression method.

Depending on the compression method, the criteria for determining whether the trained model can be compressed may be different. The compression method may include pruning and filter decomposition.

When the compression method is pruning, the processor 1230 may identify, as a non-compressible block, a block in which an activation function, a normalization function, and an output channel are directly connected to an arithmetic operator. Here, the output channel being directly connected to the arithmetic operator may mean that no other block having a weight value exists between the corresponding block and the arithmetic operator. For example, a third block, a fourth block, and a fifth block may be sequentially connected in series. The fourth block may be an activation function or a normalization function, and the fifth block may be an arithmetic operator. In this case, the third block may be a “block of which the output channel is directly connected to the arithmetic operator.” Accordingly, the processor 1230 may determine that the third block is the non-compressible block.
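The pruning criterion in the example above can be sketched as a graph search that follows a block's outputs through weight-free blocks (activation or normalization functions) and reports whether an arithmetic operator is reached before any block having a weight value. This is a minimal sketch under stated assumptions: the `Block` type, the `kind` labels, and the function names are hypothetical, and the disclosure does not specify how the model graph is represented.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

WEIGHTED = {"conv", "linear"}           # assumed labels for blocks having weight values
PASS_THROUGH = {"activation", "norm"}   # weight-free blocks the search may cross


@dataclass
class Block:
    name: str
    kind: str                                        # e.g., "conv", "norm", "arithmetic"
    successors: List[str] = field(default_factory=list)


def feeds_arithmetic_directly(block: Block, blocks: Dict[str, Block]) -> bool:
    """True if an arithmetic operator is reachable without crossing a weighted block."""
    stack = list(block.successors)
    seen: Set[str] = set()
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        nxt = blocks[name]
        if nxt.kind == "arithmetic":
            return True
        if nxt.kind in PASS_THROUGH:
            stack.extend(nxt.successors)
        # A weighted block ends this path: the connection is no longer direct.
    return False


def non_compressible_for_pruning(blocks: Dict[str, Block]) -> Set[str]:
    """Names of weighted blocks whose output channel feeds an arithmetic operator directly."""
    return {
        name
        for name, b in blocks.items()
        if b.kind in WEIGHTED and feeds_arithmetic_directly(b, blocks)
    }


# Series connection mirroring the example: block3 -> block4 -> block5.
example = {
    "block1": Block("block1", "conv", ["block2"]),
    "block2": Block("block2", "norm", ["block3"]),
    "block3": Block("block3", "conv", ["block4"]),
    "block4": Block("block4", "activation", ["block5"]),
    "block5": Block("block5", "arithmetic", []),
}
```

Here `block1` is not flagged because the weighted `block3` lies between it and the arithmetic operator, whereas `block3` reaches `block5` through an activation function only.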

When the compression method is filter decomposition, the processor 1230 may identify a block including a convolutional layer as the compressible block.
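For filter decomposition, the criterion reduces to checking whether a block contains a convolutional layer. The sketch below assumes, purely for illustration, that each block is described by a list of layer-type strings and that the label `"conv"` marks a convolutional layer; neither the representation nor the function name comes from the disclosure.

```python
from typing import Dict, List


def compressible_for_filter_decomposition(blocks: Dict[str, List[str]]) -> List[str]:
    """Return the names of blocks that contain at least one convolutional layer."""
    return [name for name, layers in blocks.items() if "conv" in layers]


# Hypothetical model description: block name -> layer types inside the block.
model_blocks = {
    "stem": ["conv", "norm", "activation"],
    "pool": ["pool"],
    "head": ["linear"],
}
compressible = compressible_for_filter_decomposition(model_blocks)
```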

The processor 1230 may control the communication interface 1210 to transmit a command to a user device that causes the user device to display, on a first screen, a structure of the trained model representing a connection relationship between the plurality of blocks such that the compressible block and the non-compressible block are visually distinguished, and to display, on a second screen, an input field for receiving a configuring value for compression of the compressible block. The user device may display the structure of the trained model on the first screen based on the command received from the electronic apparatus 1200. Also, the user device may display the input field for receiving the configuring value for compression of the compressible block on the second screen. The user device may simultaneously output the first screen and the second screen.

The structure of the trained model may represent a connection relationship between a plurality of UI elements each being associated with a respective one of the plurality of blocks included in the trained model. The plurality of UI elements may each represent information on one of the plurality of blocks. The information on each of the plurality of blocks may include identification information of each of the plurality of blocks and a plurality of latencies each being associated with a respective one of the plurality of blocks. For example, the structure of the trained model may be expressed in a graph form in which a plurality of UI elements are expressed as nodes.
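One way to picture the graph form is a payload of nodes and edges that a user device could render, with each node carrying a block's identification information, its latency, and a flag used to visually distinguish compressible blocks. The field names (`id`, `compressible`, `latency_ms`, `source`, `target`) and the function name are assumptions made for this sketch; the disclosure does not define a message format.

```python
import json
from typing import Dict, Iterable, List, Set, Tuple


def build_structure_payload(
    block_names: List[str],
    edges: Iterable[Tuple[str, str]],
    compressible: Set[str],
    latencies_ms: Dict[str, float],
) -> dict:
    """Describe the model structure as nodes (one per block) and directed edges."""
    nodes = [
        {
            "id": name,                            # identification information
            "compressible": name in compressible,  # drives the visual distinction
            "latency_ms": latencies_ms.get(name),  # None if not measured
        }
        for name in block_names
    ]
    links = [{"source": src, "target": dst} for src, dst in edges]
    return {"nodes": nodes, "edges": links}


# Placeholder values for illustration only.
payload = build_structure_payload(
    block_names=["block1", "block2", "block3"],
    edges=[("block1", "block2"), ("block2", "block3")],
    compressible={"block1"},
    latencies_ms={"block1": 1.5, "block2": 0.2},
)
message = json.dumps(payload)  # serialized form that could accompany the display command
```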

Meanwhile, the processor 1230 may acquire a plurality of latencies each being associated with a respective one of the plurality of blocks using a device farm including a target device on which the trained model is to be executed. For example, when a first device is selected as the target device, the user device may transmit the information on the first device to the electronic apparatus 1200. The processor 1230 may identify the first device in the device farm based on the information on the first device. The processor 1230 may calculate the plurality of latencies each being associated with a respective one of the plurality of blocks by executing the trained model in the first device.
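Per-block latency measurement can be sketched as timing each block in sequence while passing each block's output to the next. The sketch below runs locally with plain Python callables as stand-ins for blocks; in the described system the execution would take place on the selected device in the device farm. Taking the minimum over several repeats is an assumption of this sketch, not something the disclosure specifies.

```python
import time
from typing import Any, Callable, Dict, List, Tuple


def measure_block_latencies(
    blocks: List[Tuple[str, Callable[[Any], Any]]],
    sample: Any,
    repeats: int = 5,
) -> Dict[str, float]:
    """Return a latency in milliseconds for each block, keyed by block name."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    latencies: Dict[str, float] = {}
    x = sample
    for name, fn in blocks:
        best = float("inf")
        out = None
        for _ in range(repeats):
            start = time.perf_counter()
            out = fn(x)
            best = min(best, time.perf_counter() - start)
        latencies[name] = best * 1000.0
        x = out  # the next block consumes this block's output
    return latencies


# Stand-in blocks for illustration only.
toy_blocks = [
    ("scale", lambda v: [2.0 * a for a in v]),
    ("shift", lambda v: [a + 1.0 for a in v]),
]
latencies = measure_block_latencies(toy_blocks, [1.0, 2.0, 3.0])
```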

The processor 1230 may compress the trained model based on the configuring value for block compression input by the user in the input field. For example, the processor 1230 may perform the pruning on the trained model based on the pruning ratio input by the user.
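As an illustration of pruning driven by a user-supplied ratio, the sketch below removes the given fraction of filters from one block, dropping those with the smallest L1 norm first. The L1-norm ranking is an assumption chosen for this example; the disclosure states only that pruning is performed based on the pruning ratio. Filters are represented as plain lists of weights, and the function name is hypothetical.

```python
from typing import List


def prune_filters(filters: List[List[float]], ratio: float) -> List[List[float]]:
    """Remove the `ratio` fraction of filters with the smallest L1 norm."""
    if not 0.0 <= ratio < 1.0:
        raise ValueError("ratio must be in [0, 1)")
    n_remove = int(len(filters) * ratio)
    norms = [sum(abs(w) for w in f) for f in filters]
    # Indices ordered from smallest to largest norm.
    order = sorted(range(len(filters)), key=lambda i: norms[i])
    removed = set(order[:n_remove])
    # Preserve the original ordering of the surviving filters.
    return [f for i, f in enumerate(filters) if i not in removed]


# Placeholder weights for illustration only.
block_filters = [[0.1, -0.1], [1.0, 2.0], [0.0, 0.05], [-3.0, 0.5]]
pruned = prune_filters(block_filters, ratio=0.5)
```

With a ratio of 0.5 and four filters, the two filters with the smallest norms are removed and the remaining two keep their original order.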

Various exemplary embodiments of the present disclosure described above may be implemented in a computer or a computer readable recording medium using software, hardware, or a combination of software and hardware. In some cases, embodiments described in the present disclosure may be implemented as the processor itself. According to a software implementation, embodiments such as procedures and functions described in the disclosure may be implemented as separate software modules. Each of the software modules may perform one or more functions and operations described in the disclosure.

Computer instructions for performing processing operations according to the diverse embodiments of the disclosure described above may be stored in a non-transitory computer-readable medium. The computer instructions stored in the non-transitory computer-readable medium allow a specific machine to perform the processing operations according to the diverse embodiments described above when they are executed by a processor.

The non-transitory computer-readable medium is not a medium that stores data for a while, such as a register, a cache, a memory, or the like, but is a medium that semi-permanently stores data and is readable by the apparatus. A specific example of the non-transitory computer-readable medium may include a compact disk (CD), a digital versatile disk (DVD), a hard disk, a Blu-ray disk, a universal serial bus (USB), a memory card, a read only memory (ROM), or the like.

The machine-readable storage medium may be provided in the form of a non-transitory storage medium. Here, the “non-transitory storage medium” means that the storage medium is a tangible device, and does not include a signal (for example, electromagnetic waves), and the term does not distinguish between the case where data is stored semi-permanently on a storage medium and the case where data is temporarily stored thereon. For example, the “non-transitory storage medium” may include a buffer in which data is temporarily stored.

The methods according to the diverse embodiments disclosed in the document may be included and provided in a computer program product. The computer program product may be traded as a product between a seller and a purchaser. The computer program product may be distributed in the form of a machine-readable storage medium (for example, compact disc read only memory (CD-ROM)), or may be distributed (for example, download or upload) through an application store (for example, Play Store™) or may be directly distributed (for example, download or upload) between two user devices (for example, smart phones) online. In a case of the online distribution, at least some of the computer program products (for example, downloadable app) may be at least temporarily stored in a machine-readable storage medium such as a memory of a server of a manufacturer, a server of an application store, or a relay server or be temporarily created.

According to various embodiments of the present disclosure as described above, it is possible to provide a neural network model optimized for a target device.

According to various embodiments of the present disclosure as described above, it is possible to provide a neural network model trained based on a data set input by a user.

According to various embodiments of the present disclosure as described above, it is possible to provide a compressed neural network model based on a configuring value for compression input by a user.

According to various embodiments of the present disclosure as described above, it is possible to provide download data corresponding to the compressed neural network model.

Accordingly, it is possible to improve user convenience and satisfaction.

In many instances entities are described herein as being coupled to other entities. It should be understood that the terms “coupled” and “connected” (or any of their forms) are used interchangeably herein and, in both cases, are generic to the direct coupling of two entities (without any non-negligible (e.g., parasitic) intervening entities) and the indirect coupling of two entities (with one or more non-negligible intervening entities). Where entities are shown as being directly coupled together, or described as coupled together without description of any intervening entity, it should be understood that those entities can be indirectly coupled together as well unless the context clearly dictates otherwise.

It is contemplated that any optional feature of the inventive variations described may be set forth and claimed independently, or in combination with any one or more of the features described herein. It is further noted that the claims may be drafted to exclude any optional element for an embodiment. As such, this statement is intended to serve as antecedent basis for use of such exclusive terminology as “solely,” “only” and the like in connection with the recitation of claim elements, or use of a “negative” limitation. Unless defined otherwise herein, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The breadth of the present invention is not to be limited by the subject specification, but rather only by the plain meaning of the claim terms employed.

In addition, the effects that can be obtained or predicted by embodiments of the present disclosure have been disclosed directly or implicitly in the detailed description of the embodiments of the present disclosure. For example, various effects predicted according to the embodiments of the present disclosure have been disclosed in the above-described detailed description.

The embodiments described herein and the claims thereto are directed to patent eligible subject matter. These embodiments do not constitute abstract ideas for a myriad of reasons. One such reason is that the claims provide for the ability of neural network optimization. These apparatuses and computer implemented methods allow for the determination of target device attributes and the acquisition and/or use of a neural network model that is optimized for a target device, and thereby constitute an improvement to the functioning of the computer itself, which may otherwise run sub-optimized neural networks, and thus qualify as “significantly more” than an abstract idea.

Other aspects, advantages, and prominent features of the present disclosure will become apparent to those skilled in the art from the above detailed description which discloses various embodiments of the present disclosure taken in conjunction with the accompanying drawings.

Although the embodiments of the disclosure have been illustrated and described hereinabove, the disclosure is not limited to the above-described specific embodiments, but may be variously modified by those skilled in the art to which the disclosure pertains without departing from the gist of the disclosure as disclosed in the accompanying claims. These modifications should also be understood to fall within the scope and spirit of the disclosure.

What is claimed is:
 1. A method of controlling a server for compressing a neural network model, the method including: receiving, at a processor of the server, a compression mode from a user device, the compression mode including a first compression mode and a second compression mode; controlling, via the processor, the user device to display a compression setting screen for receiving a configuring value, the compression setting screen being displayed differently according to the compression mode; and controlling, via the processor, the user device to output download data in which the neural network model is compressed based on the configuring value, wherein the neural network model includes a plurality of blocks, wherein the configuring value of the first compression mode is a first configuring value for configuring a compression ratio for an entire neural network model, and wherein the configuring value of the second compression mode is a second configuring value for configuring a compression ratio for at least one block of the plurality of blocks.
 2. The method of claim 1, wherein the compression setting screen of the first compression mode includes a base model input region for receiving a base model to be compressed, and a configuring value input region for receiving the first configuring value for configuring the compression ratio for the base model.
 3. The method of claim 1, wherein the compression setting screen of the second compression mode includes a block display region for displaying information on a first block which is at least one of the plurality of blocks comprised in a base model to be compressed, and a configuring value input region for receiving the second configuring value for configuring the compression ratio for the first block.
 4. The method of claim 3, wherein the compression setting screen of the second compression mode includes an architecture display region for displaying an architecture indicating a connection relationship between the plurality of blocks of the base model, wherein the plurality of blocks of the base model includes a compressible block and a non-compressible block, and wherein the first block being displayed on the block display region is the compressible block.
 5. The method of claim 3, further including: controlling, via the processor, the user device to display a compression method setting screen prior to displaying the compression setting screen when the processor receives the second compression mode, wherein the compression method setting screen includes a base model input region for receiving the base model and a compression method setting region for receiving a compression method to be applied to the base model, and wherein at least a part of the information displayed on the compression setting screen of the second compression mode is different according to the compression method set in the compression method setting region.
 6. The method of claim 1, wherein the neural network model of the first compression mode is compressed using an individual parameter which is the compression ratio for at least one individual block of the plurality of blocks, wherein the individual parameter is obtained based on the first configuring value.
 7. The method of claim 4, wherein the compressible block and the non-compressible block are displayed so as to be visually distinguished on the compression setting screen of the second compression mode.
 8. The method of claim 7, wherein the compressible block is selectable by a user, and the non-compressible block is non-selectable by the user.
 9. The method of claim 3, wherein the second configuring value of the first block is calculated and input by the processor based on user input, and wherein the processor controls the user device to output the download data in which the neural network model is compressed based on a modified second configuring value when the modified second configuring value is received from the user device after the second configuring value of the first block is displayed via the user device.
 10. A server for compressing a neural network model, the server including: a communication interface, configured to communicate with a user device, including at least one communication circuit; and a processor connected to the communication interface, wherein the processor is configured to perform: receiving, at the processor of the server, a compression mode from the user device, the compression mode including a first compression mode and a second compression mode, controlling, via the processor, the user device to display a compression setting screen for receiving a configuring value, the compression setting screen being displayed differently according to the compression mode, and controlling, via the processor, the user device to output download data in which the neural network model is compressed based on the configuring value, wherein the neural network model includes a plurality of blocks, wherein the configuring value of the first compression mode is a first configuring value for configuring a compression ratio for an entire neural network model, and wherein the configuring value of the second compression mode is a second configuring value for configuring a compression ratio for at least one block of the plurality of blocks.
 11. The server of claim 10, wherein the compression setting screen of the first compression mode includes a base model input region for receiving a base model to be compressed, and a configuring value input region for receiving the first configuring value for configuring the compression ratio for the base model.
 12. The server of claim 10, wherein the compression setting screen of the second compression mode includes a block display region for displaying information on a first block which is at least one of the plurality of blocks comprised in a base model to be compressed, and a configuring value input region for receiving the second configuring value for configuring the compression ratio for the first block.
 13. The server of claim 12, wherein the processor is further configured to perform: controlling, via the processor, the user device to display a compression method setting screen prior to displaying the compression setting screen when the processor receives the second compression mode, wherein the compression method setting screen includes a base model input region for receiving the base model and a compression method setting region for receiving a compression method to be applied to the base model, and wherein at least a part of the information displayed on the compression setting screen of the second compression mode is different according to the compression method set in the compression method setting region.
 14. The server of claim 10, wherein the neural network model of the first compression mode is compressed using an individual parameter which is the compression ratio for at least one individual block of the plurality of blocks, wherein the individual parameter is obtained based on the first configuring value.
 15. The server of claim 12, wherein the second configuring value of the first block is calculated and input by the processor based on user input, and wherein the processor controls the user device to output the download data in which the neural network model is compressed based on a modified second configuring value when the modified second configuring value is received from the user device after the second configuring value of the first block is displayed via the user device.
 16. A method of controlling a user device for compressing a neural network model, the method including: receiving, at a processor of the user device, a compression mode from a user, the compression mode including a first compression mode and a second compression mode; displaying, via the processor, a compression setting screen for receiving a configuring value, the compression setting screen being displayed differently according to the compression mode; and outputting, via the processor, download data in which the neural network model is compressed based on the configuring value, wherein the neural network model includes a plurality of blocks, wherein the configuring value of the first compression mode is a first configuring value for configuring a compression ratio for an entire neural network model, and wherein the configuring value of the second compression mode is a second configuring value for configuring a compression ratio for at least one block of the plurality of blocks.
 17. The method of claim 16, wherein the compression setting screen of the first compression mode includes a base model input region for receiving a base model to be compressed, and a configuring value input region for receiving the first configuring value for configuring the compression ratio for the base model.
 18. The method of claim 16, wherein the compression setting screen of the second compression mode includes a block display region for displaying information on a first block which is at least one of the plurality of blocks comprised in a base model to be compressed, and a configuring value input region for receiving the second configuring value for configuring the compression ratio for the first block.
 19. The method of claim 18, further including: displaying, via the processor, a compression method setting screen prior to displaying the compression setting screen when the processor receives the second compression mode, wherein the compression method setting screen includes a base model input region for receiving the base model and a compression method setting region for receiving a compression method to be applied to the base model, and wherein at least a part of the information displayed on the compression setting screen of the second compression mode is different according to the compression method set in the compression method setting region.
 20. The method of claim 16, wherein the neural network model of the first compression mode is compressed using an individual parameter which is the compression ratio for at least one individual block of the plurality of blocks, wherein the individual parameter is obtained based on the first configuring value.